When presented with a data stream of two statistically dependent variables, predicting the future of one of the variables (the target stream) can benefit from information about both its history and the history of the other variable (the source stream). For example, fluctuations in temperature at a weather station can be predicted using both temperatures and barometric readings. However, a challenge when modelling such data is that it is easy for a neural network to rely on the greatest joint correlations within the target stream, which may ignore a crucial but small information transfer from the source to the target stream. In addition, the target stream has often already been modelled independently, and it would be useful to use that model to inform a new joint model. Here, we develop an information bottleneck approach for conditional learning on two dependent streams of data. Our method, which we call Transfer Entropy Bottleneck (TEB), allows one to learn a model that bottlenecks the directed information transferred from the source variable to the target variable, while quantifying this information transfer within the model. As such, TEB provides a useful new information bottleneck approach for modelling two statistically dependent streams of data in order to make predictions about one of them.
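Bloom filters are widely used data structures that compactly represent sets of elements. Querying a Bloom filter reveals whether an element is definitely not in the underlying set or is included with a certain error rate. This membership test can be modelled as a binary classification problem and solved with deep learning models, leading to so-called learned Bloom filters. We have found that the benefits of learned Bloom filters become apparent only when large amounts of data are considered, and even then there is room to reduce their memory consumption further. We therefore introduce a lossless input compression technique that improves the memory consumption of the learned model while preserving comparable model accuracy. We evaluate our approach and show significant improvements in memory consumption over learned Bloom filters.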
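Citizen science datasets can be very large and promise to improve species distribution modelling, but detection is imperfect, which risks bias when fitting models. In particular, observers may fail to detect species that are actually present. Occupancy models can estimate and correct for this observation process, and multi-species occupancy models exploit similarities in the observation process, which can improve estimates for rare species. However, the computational methods currently used to fit these models do not scale to large datasets. We develop approximate Bayesian inference methods and use graphics processing units (GPUs) to scale multi-species occupancy models to very large citizen science datasets. We fit multi-species occupancy models to one month of data from the eBird project, consisting of 186,811 checklist records covering 430 bird species. We evaluate predictions on a spatially separated test set of 59,338 records, comparing two inference methods, Markov chain Monte Carlo (MCMC) and variational inference (VI), against occupancy models fitted to each species separately using maximum likelihood. We fit models to the entire dataset using VI, and to up to 32,000 records using MCMC. VI fitted to the entire dataset performed best, outperforming single-species models on both AUC (90.4% compared to 88.7%) and log likelihood (-0.080 compared to -0.085). We also evaluate how well the range maps predicted by the model agree with expert maps. We find that modelling the detection process greatly improves agreement, and that the resulting maps agree with expert maps as closely as maps estimated from high-quality survey data. Our results show that multi-species occupancy models are a compelling approach to modelling large citizen science datasets and that, once the observation process is taken into account, they can model species distributions accurately.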